Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unique constraints to the email and username fields in the galaxy_user table #18493

Merged
merged 13 commits into from
Sep 24, 2024

Conversation

jdavcs
Copy link
Member

@jdavcs jdavcs commented Jul 4, 2024

Ref #18487


BEFORE MERGING:


Migrations that add unique constraints to email and username fields in the galaxy_user table + model definition update.
Unique constraints are implemented with indexes in both postgres and sqlite, so the 2 existing indexes are no longer needed - which is why the migration drops them.

The model definitions for both tables contain both index=True and unique=True: that's not a problem. Having unique=True would be sufficient, but this, I think, makes it a little easier to understand (maybe?). I've tested manually - in any case a b-tree index is created with a unique constraint.

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@jdavcs jdavcs added kind/enhancement area/database Galaxy's database or data access layer labels Jul 4, 2024
@jdavcs jdavcs added this to the 24.2 milestone Jul 4, 2024
@jdavcs jdavcs marked this pull request as draft July 4, 2024 01:59
@jdavcs
Copy link
Member Author

jdavcs commented Jul 4, 2024

Failures are relevant: sqlite and postgres create different db objects based on the column specification index=True unique=True, which is why our migration tests fail. One more cross-database gotcha..


def upgrade():
with transaction():
create_unique_constraint(constraint_name, table_name, [column_name])
Copy link
Member

@mvdbeek mvdbeek Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to fail without #18492. I would pull that into the migration. Once the constraint is present we'll never need the script, so I don't see why we have it live as a separate script.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial thinking was that we could run that script on the current release, which would deduplicate the usernames and, with that, stop the errors. However, the more I thought about it, I became convinced that it shouldn't be merged into a stable release: the script is a feature and belongs in dev. So yes, I'll move it into the same PR.

That said, I'm not sure we should be combining database schema revisions with data migrations in the same revision script. I think it may be better to have a script and reference it in the db migration module as an "upgrade note" (like I did here) + in the release admin notes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be pragmatic here and not provide a migration that can fail, which is going to be a major pain for admins. This is not a huge amount of data that needs to be altered, we're talking a bit more than 200 users on main.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I suppose we can distinguish between (a) a data migration that addresses business logic needs and (b) a data migration that fixes inconsistent data to enable the db schema migration; and combine the latter with the db schema migration into one revision script. We should then mention such cases in the release admin notes to alert admins that some data is being changed (adding a label now).

@jdavcs jdavcs added the highlight/admin Included in admin/dev release notes label Jul 5, 2024
@jdavcs jdavcs force-pushed the dev_user_db_fields branch 4 times, most recently from c57cb7d to 48bf21f Compare September 19, 2024 02:49
@jdavcs
Copy link
Member Author

jdavcs commented Sep 19, 2024

Implementation note on indexes and unique constraints
In both postgresql and sqlite, a unique constraint is implemented as an index. When the model's column definition contains both index=True and unique=True, only one index will be created. Therefore, attempting to create or drop both the index and the unique constraint in the migration script will result in an error. To handle this, we simply use create_index, passing unique=True as a keyword argument.

If we are adding a unique constraint to a column that had an index defined on it, we must drop that index, because it did not enforce a unique constraint. However, only existing databases will contain that index, new databases will not - so we must check for its existence in the migration script before issuing the drop_index command.

@jdavcs
Copy link
Member Author

jdavcs commented Sep 19, 2024

unit test failures relevant: the tests don't respect the new constraints. I'll fix this.
UPDATE: fixed.

@jdavcs jdavcs marked this pull request as ready for review September 21, 2024 04:18
@jdavcs
Copy link
Member Author

jdavcs commented Sep 21, 2024

So this has a new kind of test: individual migration test. The purpose is to test how existing data (that would prevent the execution of a particular migration) is fixed in the database during the migration process. For example, let's say we are adding a unique constraint to a field that currently contains duplicate values. We must cleanup the data before we can run this migration, so we have a script that handles this cleanup (by deleting duplicates, or deduplicating them, etc.). This test verifies such a script.

Here's what the test does:

  1. Load and initialize a new database and migration environment
  2. Downgrade to the revision BEFORE the migration under test
  3. Load and verify bad data (that would have prevented running the next revision script)
  4. Run the migration (via alembic)
  5. Verify that data has been cleaned up as expected.

@jdavcs
Copy link
Member Author

jdavcs commented Sep 23, 2024

Integration test failures unrelated.

@jdavcs jdavcs merged commit e49c21a into galaxyproject:dev Sep 24, 2024
52 of 56 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/database Galaxy's database or data access layer highlight/admin Included in admin/dev release notes kind/enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants